NOTE: In plots, where there is “n=”, this figure refers to the total number of respondents in the row/column. This presentation is somewhat misleading and will be changed in future iterations.
employed <- readxl::read_xls(filepath, sheet = "S01.1", range = "F13:F24", col_names = FALSE)
area_names <- readxl::read_xls(filepath, sheet = "S01.1", range = "B13:B24", col_names = FALSE)

rgn_empl_denoms <- data.frame(area_names, employed) %>%
  mutate(across(where(is.numeric), ~ . * 1000)) # *1000 to get real number
colnames(rgn_empl_denoms) <- c("Region", "Employed")

rgn_empl_denoms <- rgn_empl_denoms %>%
  mutate(Weight = Employed / sum(Employed))
Using our working definition, how many of us could be described as outsourced?
Code
total_outsourced <- data %>%
  group_by(outsourcing_status) %>%
  summarise(Sum = sum(NatRepemployees)) %>%
  mutate(
    Proportion = Sum / sum(Sum),
    Percentage = 100 * Proportion
  )
readr::write_csv(total_outsourced, file = "../outputs/data/total_outsourced.csv")

# Create function to find nearest denominator to express as a fraction.
f <- function(x) {
  ifelse(abs(1 / floor(1 / x) - x) < abs(1 / ceiling(1 / x) - x),
    floor(1 / x),
    ceiling(1 / x)
  )
}
According to our definition, 1 in 6 UK workers are outsourced.
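The "1 in 6" wording can be reproduced from the proportion with the nearest-denominator helper. The sketch below restates that logic under a hypothetical name (`nearest_denominator`) and uses an assumed illustrative share of 0.168 for "just under 17%"; the exact value in the data may differ.

```r
# Hypothetical restatement of the nearest-denominator helper:
# returns the whole number d for which 1/d is closest to x.
nearest_denominator <- function(x) {
  lower <- floor(1 / x)
  upper <- ceiling(1 / x)
  ifelse(abs(1 / lower - x) < abs(1 / upper - x), lower, upper)
}

share <- 0.168 # assumed illustrative value, not taken from the data
d <- nearest_denominator(share)
cat("1 in", d, "\n")
```

Here 1/0.168 is about 5.95, so the candidates are 1 in 5 (0.200) and 1 in 6 (0.167), and 1 in 6 is the closer of the two.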
Based on this definition, we’ve found that just under 17% of UK workers are ‘outsourced’1. Who makes up this group of 17% of UK workers?
Code
total_outsourced <- data %>%
  group_by(outsourcing_group) %>%
  summarise(Sum = sum(NatRepemployees)) %>%
  mutate(
    Proportion = Sum / sum(Sum),
    Percentage = 100 * Proportion
  )
# NOTE: this writes to the same path as the outsourcing_status summary, so it overwrites that file
readr::write_csv(total_outsourced, file = "../outputs/data/total_outsourced.csv")
In terms of the different possible types of outsourced groups, the numbers are as follows:
To start with, we can create a generalised linear model predicting outsourcing status from the various demographics
Create SectorName_short (this should be added to the cleaning script).
Code
library(forcats)

data <- data %>%
  mutate(SectorName_short = SectorName_labelled) %>%
  # make the sector names more readable
  separate_wider_delim(SectorName_short,
    names = c("SectorName_short", "SectorName_short_detail"),
    delim = ";",
    too_few = "align_start"
  ) %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail))
  )

# make region a factor and make the reference London - should be added to cleaning script
data <- data %>%
  mutate(Region = forcats::fct_relevel(factor(Region), "London"))
We construct two generalised linear models, both predicting outsourcing status. The first model has the following terms:
Age
Ethnicity
Arrival in UK
Gender
Region
Income
The second model additionally includes Sector.
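As a rough illustration of this kind of model, the sketch below fits a weighted logistic regression with `family = quasibinomial` on simulated data and converts the coefficients to odds ratios. The data frame `sim`, its columns, and the simulated effect sizes are all made up for the example; only the general approach (non-integer weights with a quasibinomial family) mirrors the models described here.

```r
set.seed(1)
n <- 500

# Simulated stand-in data (hypothetical; not the survey data)
sim <- data.frame(
  Age = round(runif(n, 18, 65)),
  Gender = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  w = runif(n, 0.5, 1.5) # non-integer survey-style weights
)

# Simulate a binary outcome that depends on age and gender
p <- plogis(-1 - 0.02 * (sim$Age - 40) + 0.5 * (sim$Gender == "Male"))
sim$outsourced <- rbinom(n, 1, p)

# quasibinomial avoids the non-integer warning that family = binomial gives with such weights
mod <- glm(outsourced ~ Age + Gender,
  data = sim, family = quasibinomial, weights = w
)

# Exponentiate the log-odds coefficients to get odds ratios
or <- exp(coef(mod))
print(round(or, 2))
```

An odds ratio above 1 indicates higher odds of the outcome relative to the reference level (or per unit increase for a numeric term), and a value below 1 indicates lower odds.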
Code
# quasibinomial because we're using weights that are non-integers
mod_1 <- glm(
  outsourcing_status ~ Age + Ethnicity_collapsed + BORNUK_labelled + Gender + Region + income_annual,
  data,
  family = "quasibinomial", weights = NatRepemployees
)
summary(mod_1)

mod_2 <- glm(
  outsourcing_status ~ Age + Ethnicity_collapsed + BORNUK_labelled + Gender + Region + income_annual + SectorName_short,
  data,
  family = "quasibinomial", weights = NatRepemployees
)
summary(mod_2)
Sector terms (SectorName_short) from the second model:

| SectorName_short | Odds ratio | CI | p |
|---|---|---|---|
| Activities of extraterritorial organisations and bodies | 0.00 | NA – 285798632.19 | 0.964 |
| Activities of households as employers | 2.08 | 0.76 – 5.20 | 0.129 |
| Administrative and support service activities | 2.34 | 1.62 – 3.38 | <0.001 |
| Agriculture, forestry and fishing | 0.48 | 0.07 – 1.72 | 0.336 |
| Arts, entertainment and recreation | 0.83 | 0.47 – 1.40 | 0.500 |
| Construction | 1.22 | 0.82 – 1.81 | 0.329 |
| Education | 0.71 | 0.50 – 1.00 | 0.050 |
| Electricity, gas, steam and air conditioning supply | 0.84 | 0.45 – 1.49 | 0.564 |
| Financial and insurance activities | 0.71 | 0.48 – 1.04 | 0.078 |
| Human health and social work activities | 0.87 | 0.64 – 1.17 | 0.351 |
| Information and communication | 0.98 | 0.68 – 1.40 | 0.896 |
| Manufacturing | 0.77 | 0.55 – 1.08 | 0.124 |
| Mining and quarrying | 0.00 | NA – 184701.67 | 0.960 |
| Na | 0.57 | 0.25 – 1.15 | 0.139 |
| Not found | 1.79 | 0.71 – 4.21 | 0.194 |
| Other service activities | 1.20 | 0.79 – 1.81 | 0.388 |
| Professional, scientific and technical activities | 0.96 | 0.65 – 1.41 | 0.844 |
| Public administration and defence | 0.60 | 0.41 – 0.87 | 0.007 |
| Real estate activities | 0.68 | 0.34 – 1.28 | 0.262 |
| Transportation and storage | 1.07 | 0.74 – 1.55 | 0.703 |
| Water supply | 1.88 | 1.05 – 3.27 | 0.029 |
| Wholesale and retail trade | 0.80 | 0.59 – 1.08 | 0.140 |

| | Model 1 | Model 2 |
|---|---|---|
| Observations | 7499 | 7498 |
| R2 Tjur | 0.046 | 0.058 |
The results show that a person is more likely to be outsourced if they:
Are younger (compared to older)
Are African, South Asian or Arab (compared to White British)
Moved to the UK in last year, 10 years, 15 years, 20 years, or more than 30 years (compared to born in UK)
Are male (compared to female)
Have a (very very slightly) higher income (I don’t think this is a meaningful effect - the OR is basically 1!)
Note there are some significant effects of Sector, but without a sensible reference category these coefficients are not really interpretable. We should explore sector separately.
A person is less likely to be outsourced if they:
Are older
Are East Asian
Are female
Live in East of England, Scotland, South East, and South West (compared to London)
But maybe we should control for the working population in each area? But wouldn’t that just be perfectly correlated with Region anyway?
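On the earlier note that the Sector coefficients lack a sensible reference category: by default R uses the first factor level (alphabetical for a factor built from strings) as the reference, so every sector odds ratio is relative to whichever sector sorts first. A minimal sketch of setting the reference explicitly with base R's `relevel()` follows. The choice of "Wholesale and retail trade" is purely illustrative, not a recommendation from the analysis, and the report itself uses `forcats::fct_relevel()` for Region, which does the same job.

```r
# Toy sector vector (hypothetical values drawn from the sector names above)
sector <- factor(c(
  "Construction", "Education", "Wholesale and retail trade",
  "Education", "Construction", "Wholesale and retail trade"
))

# Default reference level: first level in alphabetical order
levels(sector)[1]

# Set an explicit reference level (illustrative choice only)
sector_releveled <- relevel(sector, ref = "Wholesale and retail trade")
levels(sector_releveled)[1]
```

After releveling, each sector coefficient in a model would be read as a comparison against the chosen reference sector.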
Age
Outsourcing status
Code
age_statistics <- data %>%
  group_by(outsourcing_status) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = TRUE),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = TRUE),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = TRUE),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = TRUE),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = TRUE))
  )
readr::write_csv(age_statistics, file = "../outputs/data/age_stats.csv")
As shown in the table below, the median age of the outsourced group is 36, compared to 43 for the not outsourced group.2
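The medians quoted here are weighted medians. As a minimal sketch of the idea, the base R function below (a hypothetical `weighted_median`, not the `wtd.quantile` routine used in the report, which interpolates differently and may give slightly different values) sorts the observations and returns the first value at which the cumulative share of weight reaches 50%.

```r
# Hypothetical helper: simple weighted median without interpolation
weighted_median <- function(x, w) {
  keep <- !is.na(x) & !is.na(w)
  x <- x[keep]
  w <- w[keep]
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w) / sum(w) # cumulative share of total weight
  x[which(cum_share >= 0.5)[1]]
}

# Toy data (made up): the youngest respondent carries double weight
ages <- c(25, 30, 36, 50, 60)
weights <- c(2, 1, 1, 1, 1)

weighted_median(ages, weights)    # weighted
weighted_median(ages, rep(1, 5))  # equal weights, i.e. the ordinary median
```

With the toy weights the weighted median is 30, whereas the unweighted median is 36, which shows how up-weighting younger respondents pulls the median down.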
However, as the next figure shows, the age distribution is different for the outsourced and high indicator groups compared to the not outsourced and likely agency groups; the outsourced and high indicator groups have higher proportions of younger people (~21-36 year olds).
A t-test indicates that on average, outsourced workers are significantly younger than non-outsourced workers (t(2399.2) = 11.95, p < .001).
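The fractional degrees of freedom suggest a Welch (unequal variances) t-test, which is what `t.test()` runs by default. The sketch below runs one on simulated ages and formats the p-value so that very small values are reported as "p < .001" rather than rounding to 0. The data, group sizes, and the `format_p` helper are assumptions for illustration, and the sketch is unweighted, so it does not reproduce the reported statistic.

```r
set.seed(42)

# Simulated ages (hypothetical; not the survey data)
age_outsourced <- rnorm(200, mean = 38, sd = 12)
age_not_outsourced <- rnorm(800, mean = 43, sd = 13)

# t.test() defaults to var.equal = FALSE, i.e. the Welch test
res <- t.test(age_not_outsourced, age_outsourced)

# Hypothetical helper: avoid reporting a rounded p-value as exactly 0
format_p <- function(p) {
  if (p < 0.001) "p < .001" else paste0("p = ", round(p, 3))
}

cat(sprintf(
  "t(%.1f) = %.2f, %s\n",
  res$parameter, res$statistic, format_p(res$p.value)
))
```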
age_statistics_2 <- data %>%
  group_by(outsourcing_group) %>%
  summarise(
    mean = weighted.mean(Age, w = NatRepemployees, na.rm = TRUE),
    median = wtd.quantile(Age, w = NatRepemployees, probs = c(.5), na.rm = TRUE),
    min = wtd.quantile(Age, w = NatRepemployees, probs = c(0), na.rm = TRUE),
    max = wtd.quantile(Age, w = NatRepemployees, probs = c(1), na.rm = TRUE),
    stdev = sqrt(wtd.var(Age, w = NatRepemployees, na.rm = TRUE))
  )
readr::write_csv(age_statistics_2, file = "../outputs/data/age_stats_2.csv")
Exploring the age distribution for the different outsourced groups, the high density concentration of slightly younger workers identified above appears to be driven primarily by the ‘outsourced’ and ‘high indicator’ groups. The ‘likely agency’ group follows a similar pattern, but has a lower density peak than the other groups, with a higher density of workers of more advanced ages.
ethnicities <- as.vector(unique(haven::as_factor(data$Ethnicity)))
non_white_ethnicities <- ethnicities[!(ethnicities %in% "English / Welsh / Scottish / Northern Irish / British")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
summary_table <- data %>%
  mutate(Ethnicity = haven::as_factor(Ethnicity)) %>%
  mutate(
    Ethnicity = forcats::fct_collapse(as.character(Ethnicity),
      "White British" = c("English / Welsh / Scottish / Northern Irish / British"),
      "Non-White British" = non_white_ethnicities
    )
  ) %>%
  group_by(outsourcing_status, Ethnicity) %>%
  summarise(n = n()) %>%
  mutate(
    Sum = sum(n),
    Percentage = 100 * (n / Sum)
  )

group_1 <- t(tibble(
  "present" = summary_table[which(summary_table["Ethnicity"] == "White British" & summary_table["outsourcing_status"] == "Outsourced"), "n"],
  "not present" = summary_table[which(summary_table["Ethnicity"] == "Non-White British" & summary_table["outsourcing_status"] == "Outsourced"), "n"]
))
group_2 <- t(tibble(
  "present" = summary_table[which(summary_table["Ethnicity"] == "White British" & summary_table["outsourcing_status"] == "Not outsourced"), "n"],
  "not present" = summary_table[which(summary_table["Ethnicity"] == "Non-White British" & summary_table["outsourcing_status"] == "Not outsourced"), "n"]
))

comp_mat <- as.matrix(cbind(group_2, group_1)) # matrix for crosstable
x2 <- gmodels::CrossTable(comp_mat, fisher = TRUE)
# `r if(x2[["chisq"]][["p.value"]] < .001, "< .001", paste0("= ", round(x2[["chisq"]][["p.value"]],2)))`).
# (chi-square = `r round(x2[["chisq"]][["statistic"]][["X-squared"]],2)`, *p* = `r round(x2[["chisq"]][["p.value"]],3)`).
Breaking down by ethnicity shows that the outsourced group has a lower proportion of White British workers than the non-outsourced group: 66.91% of the outsourced group are White British, compared to 78.01% of the not outsourced group. This means that there is a correspondingly higher proportion of workers from minority backgrounds in the outsourced group, notably from African (4.1%) and other White backgrounds (5.5%), amongst others.4 These differences mean that outsourced workers are 1.87 times more likely to be a member of a minority ethnic group than non-outsourced workers.
Call:
glm(formula = outsourcing_status ~ Age + Gender + income_group +
Ethnicity_short, family = "binomial", data = test_data)
Coefficients:

| Term | Estimate | Std. Error | z value | p value | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -1.122623 | 0.111181 | -10.097 | < 2e-16 | *** |
| Age | -0.024661 | 0.002392 | -10.311 | < 2e-16 | *** |
| GenderMale | 0.531731 | 0.061348 | 8.667 | < 2e-16 | *** |
| GenderOther | 0.128987 | 0.780590 | 0.165 | 0.868753 | |
| GenderPrefer not to say | -0.240452 | 1.056849 | -0.228 | 0.820021 | |
| income_groupLow | 0.276827 | 0.061893 | 4.473 | 7.73e-06 | *** |
| Ethnicity_shortAfrican | 0.750804 | 0.125846 | 5.966 | 2.43e-09 | *** |
| Ethnicity_shortAny other Asian background | 0.691922 | 0.258347 | 2.678 | 0.007400 | ** |
| Ethnicity_shortAny other Black, Black British, or Caribbean background | 0.752002 | 0.330047 | 2.278 | 0.022698 | * |
| Ethnicity_shortAny other ethnic group | 0.220800 | 0.647274 | 0.341 | 0.733011 | |
| Ethnicity_shortAny other Mixed | 0.673451 | 0.264347 | 2.548 | 0.010847 | * |
| Ethnicity_shortAny other White background | 0.171750 | 0.135112 | 1.271 | 0.203669 | |
| Ethnicity_shortArab | 0.989803 | 0.483591 | 2.047 | 0.040680 | * |
| Ethnicity_shortBangladeshi | 0.703248 | 0.286337 | 2.456 | 0.014049 | * |
| Ethnicity_shortCaribbean | 0.183752 | 0.300048 | 0.612 | 0.540267 | |
| Ethnicity_shortChinese | 0.021374 | 0.319975 | 0.067 | 0.946741 | |
| Ethnicity_shortDon’t think of myself as any of these | 1.449163 | 0.659571 | 2.197 | 0.028011 | * |
| Ethnicity_shortGypsy or Irish Traveller | 0.282974 | 0.843433 | 0.336 | 0.737246 | |
| Ethnicity_shortIndian | 0.548848 | 0.163023 | 3.367 | 0.000761 | *** |
| Ethnicity_shortIrish | -0.126226 | 0.292763 | -0.431 | 0.666357 | |
| Ethnicity_shortPakistani | 0.864916 | 0.188944 | 4.578 | 4.70e-06 | *** |
| Ethnicity_shortPrefer not to say | -0.006321 | 0.633334 | -0.010 | 0.992037 | |
| Ethnicity_shortRoma | 1.082635 | 0.769140 | 1.408 | 0.159252 | |
| Ethnicity_shortWhite and Asian | -0.090237 | 0.335327 | -0.269 | 0.787852 | |
| Ethnicity_shortWhite and Black African | 1.155726 | 0.262938 | 4.395 | 1.11e-05 | *** |
| Ethnicity_shortWhite and Black Caribbean | -0.279236 | 0.345023 | -0.809 | 0.418329 | |
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 7918.6 on 8739 degrees of freedom
Residual deviance: 7567.2 on 8714 degrees of freedom
(203 observations deleted due to missingness)
AIC: 7619.2
Number of Fisher Scoring iterations: 4
Another way of looking at this is to calculate, for each ethnicity, the proportion of workers in each outsourcing group. Doing so yields the plot below.6
# test <- multinom(outsourcing_group ~ Ethnicity_collapsed, data, weights = NatRepemployees)
# summary(test)
#
# z <- summary(test)$coefficients/summary(test)$standard.errors
# z
#
# p <- (1 - pnorm(abs(z), 0, 1)) * 2
# p
#
# # Assuming your dataframe is named 'p'
# p_2 <- apply(p, 2, function(x) ifelse(x < 0.01, 1, NA))
#
# sig_ors <- exp(summary(test)$coefficients * p_2)
# we can take the results of this forward and plot the ors
bornuk_statistics <- data %>%
  # get values of labels
  mutate_all(haven::as_factor) %>%
  group_by(outsourcing_status, BORNUK) %>%
  summarise(Frequency = n()) %>%
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  )
readr::write_csv(bornuk_statistics, file = "../outputs/data/arrival_in_UK_stats.csv")
Code
categories <- as.vector(unique(haven::as_factor(data$BORNUK)))
non_categories <- categories[!(categories %in% "I was born in the UK")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
summary_table <- data %>%
  mutate(BORNUK = haven::as_factor(BORNUK)) %>%
  mutate(
    BORNUK = forcats::fct_collapse(as.character(BORNUK),
      "Born in UK" = "I was born in the UK",
      "Not born in UK" = non_categories
    )
  ) %>%
  group_by(outsourcing_status, BORNUK) %>%
  summarise(n = n()) %>%
  mutate(
    Sum = sum(n),
    Percentage = 100 * (n / Sum)
  )

domain <- "BORNUK"
category_1 <- "Born in UK"
category_2 <- "Not born in UK"

group_1 <- t(tibble(
  "present" = summary_table[which(summary_table[domain] == category_1 & summary_table["outsourcing_status"] == "Outsourced"), "n"],
  "not present" = summary_table[which(summary_table[domain] == category_2 & summary_table["outsourcing_status"] == "Outsourced"), "n"]
))
group_2 <- t(tibble(
  "present" = summary_table[which(summary_table[domain] == category_1 & summary_table["outsourcing_status"] == "Not outsourced"), "n"],
  "not present" = summary_table[which(summary_table[domain] == category_2 & summary_table["outsourcing_status"] == "Not outsourced"), "n"]
))

comp_mat <- as.matrix(cbind(group_2, group_1)) # matrix for crosstable
x2 <- gmodels::CrossTable(comp_mat, fisher = TRUE, chisq = TRUE)
# (chi-square = `r round(x2[["chisq"]][["statistic"]][["X-squared"]],2)`, *p* = `r ifelse(x2[["chisq"]][["p.value"]] < .001, "< .001", paste0("= ", round(x2[["chisq"]][["p.value"]],2)))`).
A greater proportion of outsourced workers were not born in the UK (24.06%) compared to non-outsourced workers (13.6%).7 This difference is statistically significant; the odds of having been born outside the UK are 2.01 times higher for outsourced workers than for non-outsourced workers.
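The figure of 2.01 is consistent with an odds ratio computed directly from the two percentages quoted above. The short sketch below shows that arithmetic; the helper names `odds` and `odds_ratio` are introduced here for illustration only, and the report's own value may come from the cross-tabulation output rather than from this calculation.

```r
# Hypothetical helpers: convert a proportion to odds, then take the ratio
odds <- function(p) p / (1 - p)
odds_ratio <- function(p1, p2) odds(p1) / odds(p2)

p_outsourced <- 0.2406     # share not born in UK, outsourced workers
p_not_outsourced <- 0.136  # share not born in UK, non-outsourced workers

or_bornuk <- odds_ratio(p_outsourced, p_not_outsourced)
round(or_bornuk, 2)

# For comparison, the plain ratio of the two proportions (a risk ratio)
rr_bornuk <- p_outsourced / p_not_outsourced
round(rr_bornuk, 2)
```

The odds ratio comes out at about 2.01, while the ratio of the proportions themselves is about 1.77, so "times more likely" statements of this kind are best read as statements about odds.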
Looking at the figure below, it appears that no particular arrival time is especially common amongst the outsourced group, with a relatively equal distribution across arrival times (though potentially a slightly larger proportion fall into the ‘Within the last 10 years’ category). This is broadly the case for the likely agency and high indicators groups too, though note that amongst likely agency there is a slightly larger proportion of workers who have arrived within the last year.
Call:
glm(formula = outsourcing_status ~ income_group * BORNUK_labelled,
family = "binomial", data = data, weights = NatRepemployees)
Coefficients:

| Term | Estimate | Std. Error | z value | p value | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -1.80357 | 0.04344 | -41.516 | < 2e-16 | *** |
| income_groupLow | 0.22862 | 0.06439 | 3.551 | 0.000384 | *** |
| BORNUK_labelledWithin the last year | 1.28978 | 0.25307 | 5.096 | 3.46e-07 | *** |
| BORNUK_labelledWithin the last 3 years | 0.68932 | 0.23063 | 2.989 | 0.002800 | ** |
| BORNUK_labelledWithin the last 5 years | 0.54565 | 0.24814 | 2.199 | 0.027883 | * |
| BORNUK_labelledWithin the last 10 years | 0.74566 | 0.18405 | 4.051 | 5.09e-05 | *** |
| BORNUK_labelledWithin the last 15 years | 1.05016 | 0.18554 | 5.660 | 1.51e-08 | *** |
| BORNUK_labelledWithin the last 20 years | 0.99745 | 0.20348 | 4.902 | 9.49e-07 | *** |
| BORNUK_labelledWithin the last 30 years | -1.33247 | 0.53396 | -2.495 | 0.012579 | * |
| BORNUK_labelledMore than 30 years ago | 0.43879 | 0.23670 | 1.854 | 0.063776 | . |
| BORNUK_labelledPrefer not to say | 0.84084 | 0.44469 | 1.891 | 0.058647 | . |
| income_groupLow:BORNUK_labelledWithin the last year | -0.38239 | 0.35255 | -1.085 | 0.278074 | |
| income_groupLow:BORNUK_labelledWithin the last 3 years | -0.41813 | 0.39148 | -1.068 | 0.285484 | |
| income_groupLow:BORNUK_labelledWithin the last 5 years | 0.12118 | 0.42110 | 0.288 | 0.773525 | |
| income_groupLow:BORNUK_labelledWithin the last 10 years | 0.24893 | 0.28633 | 0.869 | 0.384622 | |
| income_groupLow:BORNUK_labelledWithin the last 15 years | -0.95025 | 0.38000 | -2.501 | 0.012396 | * |
| income_groupLow:BORNUK_labelledWithin the last 20 years | -0.84991 | 0.43888 | -1.937 | 0.052802 | . |
| income_groupLow:BORNUK_labelledWithin the last 30 years | 1.75447 | 0.69940 | 2.509 | 0.012123 | * |
| income_groupLow:BORNUK_labelledMore than 30 years ago | 0.54466 | 0.35281 | 1.544 | 0.122644 | |
| income_groupLow:BORNUK_labelledPrefer not to say | -0.12256 | 0.63991 | -0.192 | 0.848114 | |
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8150.5 on 8942 degrees of freedom
Residual deviance: 7991.6 on 8923 degrees of freedom
(1212 observations deleted due to missingness)
AIC: 8937.4
Number of Fisher Scoring iterations: 5
Code
# To me this indicates that there is main effect - arrival time pre
sjPlot::plot_model(mod, type = "int", legend.title = "")
Results of a glm suggest that most arrival times positively predict outsourcing status relative to being born in the UK. The exceptions are ‘within the last 30 years’, which has a negative coefficient, and ‘more than 30 years ago’, which is not significant at the .05 level. The takeaway is that people who migrated within the past 20 years are more likely to be outsourced than people born in the UK. People who migrated within the past 15 years are less likely to be outsourced if they’re in the low income group, whilst people who migrated within the past 30 years are more likely to be outsourced if they’re in the low income group. I would be cautious about interpreting these interaction results in isolation, as they may be influenced by other factors (e.g., ethnicity).
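Because the model includes an interaction, each arrival-time coefficient on its own describes the not-low income group (the reference level), and the effect for the low income group is the sum of the main effect and the interaction term. The sketch below works through this for ‘Within the last 15 years’ using the estimates from the model output; the variable names are introduced here for illustration.

```r
# Estimates copied from the model output (log-odds scale)
b_main_15 <- 1.05016  # BORNUK_labelledWithin the last 15 years
b_int_15 <- -0.95025  # income_groupLow:BORNUK_labelledWithin the last 15 years

# Odds ratio vs born in UK, within the not-low income group
or_not_low <- exp(b_main_15)

# Odds ratio vs born in UK, within the low income group
or_low <- exp(b_main_15 + b_int_15)

round(c(not_low = or_not_low, low = or_low), 2)
```

On these estimates the odds ratio is roughly 2.9 in the not-low income group but close to 1 in the low income group, which is what the negative interaction term expresses. This ignores the uncertainty around both estimates, so it is a reading aid rather than a test.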
Note
We should test this with a more complex model that includes covariates
The plot below shows the percentage of outsourced and non-outsourced people by income group and arrival time.
Code
temp_data <- bornuk_summary_paysplit %>%
  drop_na(income_group)

# one plot per income group
for (group in unique(temp_data$income_group)) {
  plot_data <- temp_data %>%
    filter(income_group == group)
  plot <- plot_data %>%
    ggplot(., aes(BORNUK_labelled, Percentage, fill = outsourcing_status)) +
    geom_col(colour = "black", position = position_dodge()) +
    # annotate("text", x = ethnicity_statistics$outsourcing_status, y = 75, label = paste0("n=",ethnicity_statistics$Sum)) +
    coord_flip() +
    # legend and axis titles corrected to match the mapped variables
    scale_fill_manual(values = many_colours, name = "Outsourcing status") +
    xlab("Arrival in UK") +
    theme_minimal() +
    theme(legend.position = "bottom") +
    ggtitle(paste0(group, " income"))
  print(plot)
}
data <- data %>%
  mutate(
    BORNUK_collapsed = forcats::fct_collapse(BORNUK_labelled,
      "Born in UK" = "I was born in the UK",
      "Came to UK recently" = c(
        "Within the last year", "Within the last 3 years",
        "Within the last 5 years", "Within the last 10 years"
      ),
      "Came to UK not recently" = c(
        "Within the last 15 years", "Within the last 20 years",
        "Within the last 30 years", "More than 30 years ago"
      ),
      "Prefer not to say" = c("Prefer not to say")
    )
  )

int_summary_3 <- data %>%
  group_by(outsourcing_status, Ethnicity_collapsed, BORNUK_collapsed) %>%
  summarise(Frequency = sum(NatRepemployees)) %>%
  mutate(Percentage = 100 * (Frequency / sum(Frequency)))

int_summary_3 %>%
  ggplot(., aes(Ethnicity_collapsed, Percentage, fill = BORNUK_collapsed)) +
  facet_grid(rows = vars(outsourcing_status)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = many_colours)
Code
# mod <- glm(outsourcing_status ~ Ethnicity_collapsed*BORNUK_collapsed, data, family="binomial", weight = NatRepemployees)
# summary(mod)
# emmeans(mod, specs = "Ethnicity_collapsed", by ="BORNUK_collapsed")
# sjPlot::plot_model(mod, type = "int", legend.title = "", terms = c("outsourcing_status","BORNUK_collapsed","Ethnicity_collapsed ['Black African]"))
#
gender_statistics <- data %>%
  # get values of labels
  mutate_all(haven::as_factor) %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(Frequency = n()) %>%
  mutate(Percentage = 100 * (Frequency / sum(Frequency)))
readr::write_csv(gender_statistics, file = "../outputs/data/gender_statistics.csv")
Code
gender_statistics %>%
  ggplot(., aes(outsourcing_status, Percentage, fill = Gender)) +
  geom_col(colour = "black") +
  # annotate("text", x = gender_statistics$outsourcing_status, y = 75, label = paste0("n=", gender_statistics$Frequency)) +
  coord_flip() +
  scale_fill_manual(values = colours) +
  theme_minimal() +
  xlab("Outsourcing group")
Code
categories <- as.vector(unique(haven::as_factor(data$Gender)))
non_categories <- categories[!(categories %in% "Male")]

# Will throw NA warning. I think this OK but investigate how to avoid the problem
summary_table <- data %>%
  mutate(Gender = haven::as_factor(Gender)) %>%
  mutate(
    Gender = forcats::fct_collapse(as.character(Gender),
      "Male" = "Male",
      "Not male" = non_categories
    )
  ) %>%
  group_by(outsourcing_status, Gender) %>%
  summarise(n = n()) %>%
  mutate(
    Sum = sum(n),
    Percentage = 100 * (n / Sum)
  )

domain <- "Gender"
category_1 <- "Male"
category_2 <- "Not male"

group_1 <- t(tibble(
  "present" = summary_table[which(summary_table[domain] == category_1 & summary_table["outsourcing_status"] == "Not outsourced"), "n"],
  "not present" = summary_table[which(summary_table[domain] == category_2 & summary_table["outsourcing_status"] == "Not outsourced"), "n"]
))
group_2 <- t(tibble(
  "present" = summary_table[which(summary_table[domain] == category_1 & summary_table["outsourcing_status"] == "Outsourced"), "n"],
  "not present" = summary_table[which(summary_table[domain] == category_2 & summary_table["outsourcing_status"] == "Outsourced"), "n"]
))

comp_mat <- as.matrix(cbind(group_2, group_1)) # matrix for crosstable
x2 <- gmodels::CrossTable(comp_mat, fisher = TRUE, chisq = TRUE)
In terms of Gender, the outsourced group has a larger proportion of males (57.81% compared to 46.4%). This difference is statistically significant; the odds of being male are 1.58 times higher for outsourced workers than for non-outsourced workers.
Call:
glm(formula = outsourcing_status ~ Gender * income_group, family = "binomial",
data = data, weights = NatRepemployees)
Coefficients:

| Term | Estimate | Std. Error | z value | p value | Signif. |
|---|---|---|---|---|---|
| (Intercept) | -2.01977 | 0.06791 | -29.741 | < 2e-16 | *** |
| GenderMale | 0.56260 | 0.08168 | 6.888 | 5.67e-12 | *** |
| GenderOther | 0.44378 | 0.96305 | 0.461 | 0.6449 | |
| GenderPrefer not to say | 0.65109 | 0.69214 | 0.941 | 0.3469 | |
| income_groupLow | 0.43123 | 0.08751 | 4.928 | 8.33e-07 | *** |
| GenderMale:income_groupLow | -0.24070 | 0.11969 | -2.011 | 0.0443 | * |
| GenderOther:income_groupLow | -0.10606 | 1.38352 | -0.077 | 0.9389 | |
| GenderPrefer not to say:income_groupLow | -0.65321 | 1.10050 | -0.594 | 0.5528 | |
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 8150.5 on 8942 degrees of freedom
Residual deviance: 8076.3 on 8935 degrees of freedom
(1212 observations deleted due to missingness)
AIC: 9001.4
Number of Fisher Scoring iterations: 4
Code
sjPlot::plot_model(mod, type ="int")
A glm finds that income group is a significant factor for females but not for males. Whilst males are more likely than females overall to be outsourced, females are significantly more likely to be outsourced if they are in the low income group than if they are in the not-low income group. The plots below show the percentage of outsourced workers by income group and gender.
Code
temp_data <- gender_summary_paysplit %>%
  drop_na(income_group)

# one plot per income group
for (group in unique(temp_data$income_group)) {
  plot_data <- temp_data %>%
    filter(income_group == group)
  plot <- plot_data %>%
    ggplot(., aes(Gender, Percentage, fill = outsourcing_status)) +
    geom_col(colour = "black", position = position_dodge()) +
    # annotate("text", x = ethnicity_statistics$outsourcing_status, y = 75, label = paste0("n=",ethnicity_statistics$Sum)) +
    coord_flip() +
    # legend and axis titles corrected to match the mapped variables
    scale_fill_manual(values = many_colours, name = "Outsourcing status") +
    xlab("Gender") +
    theme_minimal() +
    theme(legend.position = "bottom") +
    ggtitle(paste0(group, " income"))
  print(plot)
}
Let’s cross check the size of the employed workforce across regions, and compare this to how many people are in each region in our sample. The percentages should work out the same if they’re weighted.
The tables below show that our sample is weighted by region. The weighted percentage of our sampled workers in each region matches the percentages from the ONS employment by region tables. This means that the weighted percentage of workers (and therefore outsourced workers) in our sample can be considered to be representative of the national picture.
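The cross-check described above amounts to summing the weights within each region, converting to percentages, and comparing against a benchmark. A minimal base R sketch follows. The data frame `sim`, the weights, and the benchmark percentages are all invented for illustration and are not ONS figures or survey values.

```r
# Toy respondents with survey-style weights (hypothetical)
sim <- data.frame(
  Region = c("London", "London", "Scotland", "Wales", "Wales"),
  w = c(1.5, 0.5, 1.0, 0.4, 0.6)
)

# Weighted share of the sample in each region
weighted_share <- aggregate(w ~ Region, data = sim, FUN = sum)
weighted_share$Percentage <- 100 * weighted_share$w / sum(weighted_share$w)

# Made-up benchmark percentages standing in for an external source
benchmark <- data.frame(
  Region = c("London", "Scotland", "Wales"),
  Benchmark = c(50, 25, 25)
)

# Join and compare: differences near zero indicate the weights reproduce the benchmark
check <- merge(weighted_share, benchmark, by = "Region")
check$Difference <- check$Percentage - check$Benchmark
print(check)
```

If the weights were built to match regional employment totals, the differences should be close to zero, as they are in this toy example by construction.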
The plot below shows the distribution of outsourced and non-outsourced workers across regions. It suggests that an outsourced worker is more likely to be based in London than a non-outsourced worker.
In the plot below the percentages have been scaled to the size of the working population in the region as a function of the total working population in the UK. I need to check whether this scaling is actually necessary, given we are already using weighted data.11 Does the weighting process account for region?
Below we calculate the number of outsourced workers within each region.
Code
region_statistics_2 <- data %>%
  # get values of labels
  # mutate_all(haven::as_factor) %>%
  group_by(Region, outsourcing_status) %>%
  summarise(Frequency = sum(NatRepemployees)) %>%
  mutate(
    Sum = sum(Frequency),
    Percentage = 100 * (Frequency / Sum)
  ) %>%
  rename(`Outsourcing status` = outsourcing_status)

region_statistics_2 %>%
  ggplot(., aes(Region, Percentage, fill = `Outsourcing status`)) +
  geom_col(colour = "black") +
  coord_flip() +
  scale_fill_manual(values = many_colours) +
  theme_minimal()
As we can see, London has the highest proportion of outsourced workers (25%). After London, the regions with the highest proportion of outsourced workers are:
# filter to just cases where income is above the fifth percentile and lower than the 95th? I.e., drop the top and bottom 5%.
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_status) %>%
  summarise(
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = TRUE),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = TRUE),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = TRUE),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = TRUE))
  )

knitr::kable(income_statistics,
  digits = 2,
  col.names = c("Outsourcing group", "Mean", "Median", "Min", "Max", "Standard dev.")
) %>%
  kable_styling(full_width = FALSE)
# plot the distribution of income for the two groups
data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_status, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(
    inherit.aes = FALSE, data = income_statistics, aes(outsourcing_status, y = 6e+04),
    label = paste0("Mean = ", round(income_statistics$mean, 0), "\n", "Median = ", income_statistics$median),
    nudge_x = 0.1, hjust = 0
  ) +
  theme_minimal() +
  xlab("Outsourcing status") +
  ylab("Annual income") +
  # single coord_cartesian() call: a second call would replace the first and drop xlim
  coord_cartesian(
    xlim = c(1, 2.5),
    ylim = c(
      plyr::round_any(min(income_statistics$min), 5000, f = floor),
      plyr::round_any(max(income_statistics$max), 5000, f = ceiling)
    )
  ) +
  scale_y_continuous(breaks = seq(
    plyr::round_any(min(income_statistics$min), 5000, f = ceiling),
    plyr::round_any(max(income_statistics$max), 5000, f = ceiling),
    10000
  ))
The distribution for the different outsourcing groups is shown below. It indicates that income is particularly low for the ‘outsourced’ and ‘likely agency’ workers, whilst average income for the ‘high indicators’ workers is notably higher. This means that, were we not to consider the high indicators group, the difference in income between outsourced and non-outsourced workers would be larger.
Code
income_statistics <- data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  group_by(outsourcing_group) %>%
  summarise(
    n = n(),
    mean = weighted.mean(income_annual_all, w = NatRepemployees, na.rm = TRUE),
    median = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(.5), na.rm = TRUE),
    min = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(0), na.rm = TRUE),
    max = wtd.quantile(income_annual_all, w = NatRepemployees, probs = c(1), na.rm = TRUE),
    stdev = sqrt(wtd.var(income_annual_all, w = NatRepemployees, na.rm = TRUE))
  )

data %>%
  filter(income_drop_all == 0 & !is.na(income_annual_all)) %>%
  ggplot(., aes(outsourcing_group, income_annual_all)) +
  geom_violin() +
  geom_boxplot(width = 0.3) +
  geom_text(
    inherit.aes = FALSE, data = income_statistics, aes(outsourcing_group, y = 6e+04),
    label = paste0(
      "Mean = ", round(income_statistics$mean, 0), "\n",
      "Median = ", round(income_statistics$median, 0), "\n N = ", income_statistics$n
    ),
    nudge_x = 0.1, hjust = 0
  ) +
  theme_minimal() +
  xlab("Outsourcing group") +
  ylab("Annual income") +
  # single coord_cartesian() call: a second call would replace the first and drop xlim
  coord_cartesian(
    xlim = c(1, 2.5),
    ylim = c(
      plyr::round_any(min(income_statistics$min), 5000, f = floor),
      plyr::round_any(max(income_statistics$max), 5000, f = ceiling)
    )
  ) +
  scale_y_continuous(breaks = seq(
    plyr::round_any(min(income_statistics$min), 5000, f = ceiling),
    plyr::round_any(max(income_statistics$max), 5000, f = ceiling),
    10000
  ))
Although the average pay of non-outsourced and outsourced workers looks similar, a t-test finds a statistically significant difference: outsourced workers are on average paid less than non-outsourced workers (t(1511.07) = 3.97, p < .001).
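The p-value can be recovered from the reported t statistic and degrees of freedom, which is a quick way to check how it should be written up. The sketch below assumes a two-sided test; `report_p` is a hypothetical formatting helper, not part of the original analysis.

```r
# Reported test statistic and (fractional) degrees of freedom from the text
t_stat <- 3.97
df <- 1511.07

# Two-sided p-value from the t distribution (assumes a two-sided test)
p_value <- 2 * pt(-abs(t_stat), df = df)

# Hypothetical helper: report very small p-values as a bound rather than 0
report_p <- function(p) {
  if (p < 0.001) "p < .001" else sprintf("p = %.3f", p)
}

cat(sprintf("t(%.2f) = %.2f, %s\n", df, t_stat, report_p(p_value)))
```

The p-value is small but not zero, so it is conventionally reported as a bound.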
Below, we run a linear regression testing whether the relationship between outsourcing status and annual income is influenced by income group (not low vs low), controlling for age, gender, ethnicity, and region. We do indeed find a significant interaction effect. The figure below plots this.
Code
test <- lm(
  income_annual ~ Age + Gender + Ethnicity + Region + outsourcing_status * income_group,
  data,
  weights = NatRepemployees
)
# summary(test)
emmeans(test, specs = "outsourcing_status", by = "income_group")
income_group = Not low:
 outsourcing_status emmean    SE   df lower.CL upper.CL
 Not outsourced      62912 13148 7360    37139    88686
 Outsourced          90189 13564 7360    63598   116779

income_group = Low:
 outsourcing_status emmean    SE   df lower.CL upper.CL
 Not outsourced      23036 13200 7360    -2840    48912
 Outsourced          21766 13846 7360    -5375    48908

Results are averaged over the levels of: Gender, Region
Confidence level used: 0.95
Code
sjPlot::plot_model(test, type ="int")
The results here indicate that among workers that are not paid below our low pay threshold, an outsourced worker can typically be expected to earn considerably more than a non-outsourced worker, whereas among workers that are paid below our low pay threshold, an outsourced worker can typically be expected to be paid the same as a non-outsourced worker (maybe slightly less). Of note is that for both pay groups, the variance in pay is higher for the outsourced groups.
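To make the logic of the interaction concrete, here is a minimal sketch on simulated data. It is not the survey model: there are no covariates, the weights `w` are made up, and the cell means come from base-R `predict()` rather than `emmeans()`. With no covariates to average over, the two approaches should coincide.

```r
set.seed(42)
n <- 400

# Simulated (hypothetical) data with the same two factors as the model above
toy <- data.frame(
  outsourcing_status = factor(sample(c("Not outsourced", "Outsourced"), n, replace = TRUE)),
  income_group = factor(sample(c("Not low", "Low"), n, replace = TRUE),
                        levels = c("Not low", "Low")),
  w = runif(n, 0.5, 1.5)  # made-up survey weights
)

# Build in an interaction: an outsourced premium only in the "Not low" group
base_income <- ifelse(toy$income_group == "Low", 20000, 45000)
premium <- ifelse(toy$income_group == "Not low" &
                    toy$outsourcing_status == "Outsourced", 8000, 0)
toy$income_annual <- base_income + premium + rnorm(n, sd = 3000)

# Weighted linear model with the status x income group interaction
fit <- lm(income_annual ~ outsourcing_status * income_group, data = toy, weights = w)

# Predicted mean income for each of the four cells
grid <- expand.grid(
  outsourcing_status = levels(toy$outsourcing_status),
  income_group = levels(toy$income_group)
)
grid$predicted <- predict(fit, newdata = grid)
print(grid)

# Outsourced minus not outsourced, within each income group
gap_not_low <- grid$predicted[2] - grid$predicted[1]
gap_low <- grid$predicted[4] - grid$predicted[3]
```

A significant interaction corresponds to these two gaps differing, which is the pattern the estimated marginal means above describe.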
Sector
Sector within outsourcing status
This framing shows how outsourced and non-outsourced workers are distributed across sectors.
Code
sector_summary <- data %>%
  group_by(outsourcing_status, SectorName, SectorName_labelled) %>%
  summarise(
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(outsourcing_status) %>%
  mutate(
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(
    SectorName_short,
    names = c("SectorName_short", "SectorName_short_detail"),
    delim = ";", too_few = "align_start"
  ) %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail))
  )

readr::write_csv(sector_summary, "../outputs/data/sector_summary.csv")
The plot below shows the sector breakdown by outsourcing status. I.e. this is how outsourced and not outsourced workers are distributed across sectors.13
Note
With this framing we could say things like “as an outsourced worker, you are x times more likely to work in sector y than a non-outsourced worker”
2: ACTIVITIES OF EXTRATERRITORIAL ORGANISATIONS AND BODIES
3: ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US
4: ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
5: AGRICULTURE, FORESTRY AND FISHING
6: ARTS, ENTERTAINMENT AND RECREATION
7: CONSTRUCTION
8: EDUCATION
9: ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY
10: FINANCIAL AND INSURANCE ACTIVITIES
11: HUMAN HEALTH AND SOCIAL WORK ACTIVITIES
12: INFORMATION AND COMMUNICATION
13: MANUFACTURING
14: MINING AND QUARRYING
15: Not found
16: OTHER SERVICE ACTIVITIES
17: PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES
18: PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY
19: REAL ESTATE ACTIVITIES
20: TRANSPORTATION AND STORAGE
21: WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
22: WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES
The table below shows what percentage of outsourced and non-outsourced workers work in each sector, as well as the difference between them (positive numbers in the difference column indicate sectors that are more common for outsourced work, negative numbers indicate sectors that are less common for outsourced work).14
It indicates that sectors that are less common for outsourced workers compared to not outsourced are:
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY
EDUCATION
And sectors that are more common for outsourced workers compared to not outsourced are:
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
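As an illustration of how the difference column and the “x times more likely” framing can be derived, here is a minimal base-R sketch. The sector names and weighted totals are hypothetical toy values, not survey results.

```r
# Toy weighted totals (hypothetical): workers by sector and outsourcing status
toy <- data.frame(
  sector = c("A", "B", "C", "A", "B", "C"),
  outsourcing_status = rep(c("Outsourced", "Not outsourced"), each = 3),
  weight_sum = c(30, 50, 20, 10, 40, 50)
)

# Share of each status group that works in each sector (sums to 100 per status)
toy$perc <- 100 * toy$weight_sum /
  ave(toy$weight_sum, toy$outsourcing_status, FUN = sum)

# Put the two status groups side by side, one row per sector
shares <- with(toy, tapply(perc, list(sector, outsourcing_status), sum))
comparison <- data.frame(
  sector = rownames(shares),
  outsourced = shares[, "Outsourced"],
  not_outsourced = shares[, "Not outsourced"]
)

# Percentage-point difference: positive = more common among outsourced workers
comparison$difference <- comparison$outsourced - comparison$not_outsourced

# Ratio of shares: "x times more likely to work in this sector"
comparison$ratio <- comparison$outsourced / comparison$not_outsourced

print(comparison)
```

In the toy data, an outsourced worker is 3 times as likely to work in sector A as a non-outsourced worker, which corresponds to a difference of 20 percentage points.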
For these sectors that differ most in the concentration of the outsourced workforce, there is a pattern (if three data points can be called that) whereby in sectors with a relatively higher concentration of outsourced workers, outsourced workers are paid less, and in sectors with a relatively lower concentration of outsourced workers, outsourced workers are paid more. This is tenuous, but it is an example of the heterogeneity in income between sectors, and should be explored further.
The plot below shows the percentage difference in the concentration of outsourced vs non-outsourced workers (i.e. the difference between what proportion of workers of each type are in each sector) against the income difference for that sector (i.e. the difference in average income between groups). Note that a statistical test of this relationship shows it is non-significant, so this plot serves only as an illustration of where workers are situated in terms of sector and pay. A key takeaway is that the pay difference between outsourced and non-outsourced workers varies considerably across sectors. There also appears to be a central area where the concentration of outsourced and non-outsourced workers is quite similar, but the pay for outsourced workers is lower. This might indicate sectors where employment of outsourced workers is as common as employment of non-outsourced workers, but where outsourced workers are paid less than non-outsourced workers. These sectors are:
5: AGRICULTURE, FORESTRY AND FISHING
6: ARTS, ENTERTAINMENT AND RECREATION
9: ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY
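The text does not say which statistical test was applied to the relationship between concentration difference and income difference. One plausible option, shown here purely as an assumption, is a Pearson correlation test across sectors. The values below are made-up placeholders, one row per hypothetical sector.

```r
# Hypothetical sector-level values (not survey results):
# conc_diff   = percentage-point difference in concentration (outsourced - not)
# income_diff = difference in average income (outsourced - not)
sector_points <- data.frame(
  conc_diff = c(-4.1, -2.5, -0.6, 0.2, 0.9, 3.8),
  income_diff = c(1500, -800, -2100, -1700, 600, -900)
)

# Pearson correlation test (an assumed choice of test)
res <- cor.test(sector_points$conc_diff, sector_points$income_diff)

cat(sprintf("r = %.2f, p = %.3f\n", res$estimate, res$p.value))
```

With only around twenty sectors, a test like this has little power, so a non-significant result does not rule out a relationship.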
This framing shows how sectors are composed, i.e., what proportion of workers in each sector are outsourced vs non-outsourced.
Code
sector_summary_3 <- data %>%
  group_by(SectorName, SectorName_labelled, outsourcing_status) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(SectorName) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(
    SectorName_short,
    names = c("SectorName_short", "SectorName_short_detail"),
    delim = ";", too_few = "align_start"
  ) %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail))
  )

write_csv(sector_summary_3, file = "../outputs/data/sector_summary_3.csv")
The plot below shows the proportion of outsourced and not outsourced workers within each sector. I.e. this is showing what sectors have higher and lower proportions of outsourced workers.
Note
With this framing we could say things like “sector a is x times more likely to employ outsourced workers than sector b”
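A minimal sketch of this second framing, using hypothetical counts for two made-up sectors: percentages are taken within each sector (row-wise), and the ratio of the outsourced percentages gives the “x times more likely” statement.

```r
# Hypothetical weighted counts for two made-up sectors
counts <- matrix(
  c(30, 70,
    10, 90),
  nrow = 2, byrow = TRUE,
  dimnames = list(sector = c("a", "b"),
                  status = c("Outsourced", "Not outsourced"))
)

# Percentage of workers within each sector (rows sum to 100)
within_sector <- 100 * prop.table(counts, margin = 1)
print(within_sector)

# How many times more likely sector a is to employ outsourced workers than b
ratio_a_vs_b <- within_sector["a", "Outsourced"] / within_sector["b", "Outsourced"]
print(ratio_a_vs_b)
```

Note the contrast with the earlier framing: here the denominator is the sector total, not the total for each outsourcing status.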
The table below shows the percentage of outsourced workers in each Sector, ordered descending by percentage. It shows that the top three Sectors with the highest proportion of outsourced workers are:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Note that an undefined sector (‘Not found’) contained one of the largest proportions of outsourced workers (31% of workers in the ‘Not found’ category were outsourced).
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US: 35.652378
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES: 31.570055
Not found: 31.317619
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES: 30.008923
OTHER SERVICE ACTIVITIES: 21.102417
CONSTRUCTION: 20.589291
TRANSPORTATION AND STORAGE: 19.415919
PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES: 19.369714
INFORMATION AND COMMUNICATION: 19.034279
ACCOMMODATION AND FOOD SERVICE ACTIVITIES: 18.738635
ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY: 18.008232
FINANCIAL AND INSURANCE ACTIVITIES: 16.529705
WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES: 16.373830
HUMAN HEALTH AND SOCIAL WORK ACTIVITIES: 16.037739
ARTS, ENTERTAINMENT AND RECREATION: 15.255060
MANUFACTURING: 14.939669
REAL ESTATE ACTIVITIES: 13.504099
EDUCATION: 13.065534
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY: 10.051123
AGRICULTURE, FORESTRY AND FISHING: 9.709408
Exploring this workforce makeup in the context of income shows that there are some sectors where outsourced workers are paid more and some where they are paid less than non-outsourced workers. The plot below visualises this.
Sectors where outsourced workers are paid less:
ACTIVITIES OF HOUSEHOLDS AS EMPLOYERS; UNDIFFERENTIATED GOODS-AND SERVICES-PRODUCING ACTIVITIES OF HOUSEHOLDS FOR OWN US (note that N = 31)
ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES (note that N = 32)
AGRICULTURE, FORESTRY AND FISHING
ARTS, ENTERTAINMENT AND RECREATION
ELECTRICITY, GAS, STEAM AND AIR CONDITIONING SUPPLY
PUBLIC ADMINISTRATION AND DEFENCE; COMPULSORY SOCIAL SECURITY
REAL ESTATE ACTIVITIES
WATER SUPPLY; SEWERAGE, WASTE MANAGEMENT AND REMEDIATION ACTIVITIES
Sectors where outsourced workers are paid more:
ACCOMMODATION AND FOOD SERVICE ACTIVITIES
CONSTRUCTION
EDUCATION
INFORMATION AND COMMUNICATION
MANUFACTURING
Not found
OTHER SERVICE ACTIVITIES
PROFESSIONAL, SCIENTIFIC AND TECHNICAL ACTIVITIES
REAL ESTATE ACTIVITIES
Note that 2 or 3 of the Sectors where outsourced workers are paid less are low-paying Sectors (this needs to be double-checked).
The percentages below should be read as, e.g., 20% of low paid workers in accommodation and food services are outsourced, compared to 16% of not low paid workers in accommodation and food services.
sector_summary <- data %>%
  group_by(outsourcing_group, SectorName, SectorName_labelled) %>%
  summarise(
    n = n(),
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  ungroup() %>%
  group_by(outsourcing_group) %>%
  mutate(
    N = sum(n),
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum),
    SectorName_labelled = case_when(SectorName_labelled == "NA" ~ NA, TRUE ~ SectorName_labelled),
    SectorName_short = SectorName_labelled
  ) %>%
  # make the sector names more readable
  separate_wider_delim(
    SectorName_short,
    names = c("SectorName_short", "SectorName_short_detail"),
    delim = ";", too_few = "align_start"
  ) %>%
  mutate(
    SectorName_short = factor(stringr::str_to_sentence(SectorName_short)),
    SectorName_short_detail = factor(stringr::str_to_sentence(SectorName_short_detail))
  )
The plot below shows the distribution of sectors within each outsourcing group. The standout differences are:
Greater proportion of HUMAN HEALTH AND SOCIAL WORK ACTIVITIES in the ‘likely agency’ category, compared to other groups
Greater proportion of CONSTRUCTION in the ‘likely agency’ category, compared to other groups
Greater proportion of ADMINISTRATIVE AND SUPPORT SERVICE ACTIVITIES in the ‘outsourced’ category, compared to other groups
Greater proportion of WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES in the ‘high indicators’ category, compared to other groups
(Really, this plot is better for showing the makeup of each type of outsourcing group; comparisons are better made by comparing outsourcing groups within sectors. Here is a better way of interpreting these plots):
For the high indicator group, the sector with the largest proportion of workers was WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES, closely followed by HUMAN HEALTH AND SOCIAL WORK ACTIVITIES.
For the likely agency group, the sector with the largest proportion of workers was HUMAN HEALTH AND SOCIAL WORK ACTIVITIES.
For the outsourced group, the sector with the largest proportion of workers was WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES, closely followed by HUMAN HEALTH AND SOCIAL WORK ACTIVITIES.
Note that for the not outsourced group, too, the sector with the largest proportion of workers was HUMAN HEALTH AND SOCIAL WORK ACTIVITIES, closely followed by WHOLESALE AND RETAIL TRADE; REPAIR OF MOTOR VEHICLES AND MOTORCYCLES.
[This might say something about the demographics of the people who were sampled in this survey]
Call:
glm(formula = outsourcing_status ~ Age + Gender + income_group +
    Ethnicity_short, family = "binomial", data = test_data)

Coefficients:
                                                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                                                             -1.122623   0.111181 -10.097  < 2e-16 ***
Age                                                                     -0.024661   0.002392 -10.311  < 2e-16 ***
GenderMale                                                               0.531731   0.061348   8.667  < 2e-16 ***
GenderOther                                                              0.128987   0.780590   0.165 0.868753
GenderPrefer not to say                                                 -0.240452   1.056849  -0.228 0.820021
income_groupLow                                                          0.276827   0.061893   4.473 7.73e-06 ***
Ethnicity_shortAfrican                                                   0.750804   0.125846   5.966 2.43e-09 ***
Ethnicity_shortAny other Asian background                                0.691922   0.258347   2.678 0.007400 **
Ethnicity_shortAny other Black, Black British, or Caribbean background   0.752002   0.330047   2.278 0.022698 *
Ethnicity_shortAny other ethnic group                                    0.220800   0.647274   0.341 0.733011
Ethnicity_shortAny other Mixed                                           0.673451   0.264347   2.548 0.010847 *
Ethnicity_shortAny other White background                                0.171750   0.135112   1.271 0.203669
Ethnicity_shortArab                                                      0.989803   0.483591   2.047 0.040680 *
Ethnicity_shortBangladeshi                                               0.703248   0.286337   2.456 0.014049 *
Ethnicity_shortCaribbean                                                 0.183752   0.300048   0.612 0.540267
Ethnicity_shortChinese                                                   0.021374   0.319975   0.067 0.946741
Ethnicity_shortDon’t think of myself as any of these                     1.449163   0.659571   2.197 0.028011 *
Ethnicity_shortGypsy or Irish Traveller                                  0.282974   0.843433   0.336 0.737246
Ethnicity_shortIndian                                                    0.548848   0.163023   3.367 0.000761 ***
Ethnicity_shortIrish                                                    -0.126226   0.292763  -0.431 0.666357
Ethnicity_shortPakistani                                                 0.864916   0.188944   4.578 4.70e-06 ***
Ethnicity_shortPrefer not to say                                        -0.006321   0.633334  -0.010 0.992037
Ethnicity_shortRoma                                                      1.082635   0.769140   1.408 0.159252
Ethnicity_shortWhite and Asian                                          -0.090237   0.335327  -0.269 0.787852
Ethnicity_shortWhite and Black African                                   1.155726   0.262938   4.395 1.11e-05 ***
Ethnicity_shortWhite and Black Caribbean                                -0.279236   0.345023  -0.809 0.418329
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 7918.6 on 8739 degrees of freedom
Residual deviance: 7567.2 on 8714 degrees of freedom
  (203 observations deleted due to missingness)
AIC: 7619.2

Number of Fisher Scoring iterations: 4
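The coefficients above are on the log-odds scale, which is hard to read directly. A common way to interpret them is to exponentiate them into odds ratios. The sketch below does this for three coefficients copied from the output, with Wald 95% intervals (estimate ± 1.96 standard errors); the choice of Wald intervals is an assumption for illustration, not something the original analysis reports.

```r
# Estimates and standard errors copied from the model output above
coefs <- data.frame(
  term = c("Age", "GenderMale", "income_groupLow"),
  estimate = c(-0.024661, 0.531731, 0.276827),
  std_error = c(0.002392, 0.061348, 0.061893)
)

# Odds ratio = exp(log-odds coefficient)
coefs$odds_ratio <- exp(coefs$estimate)

# Wald 95% confidence interval, computed on the log-odds scale then exponentiated
z <- qnorm(0.975)
coefs$ci_low <- exp(coefs$estimate - z * coefs$std_error)
coefs$ci_high <- exp(coefs$estimate + z * coefs$std_error)

print(coefs, digits = 3)
```

Read this way, the GenderMale coefficient of 0.53 corresponds to odds of being outsourced roughly 1.7 times those of the reference gender category, holding the other predictors constant, and each additional year of age multiplies the odds by about 0.98.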
Another way of looking at this is to calculate, for each ethnicity, the proportion of workers in each outsourcing group. Doing so yields the plot below.18
Comparison of Majorgroupcode indicates that a higher proportion of outsourced people work in Elementary Occupations, compared to non-outsourced people. A lower proportion of outsourced people work in administrative and secretarial occupations, associate professional occupations and professional occupations.
Code
mgc_summary <- data %>%
  group_by(outsourcing_status, Majorgroupcode) %>%
  summarise(
    Frequency = sum(NatRepemployees),
    avg_income = mean(income_annual, na.rm = TRUE),
    wtd_avg_income = weighted.mean(income_annual, w = NatRepemployees, na.rm = TRUE)
  ) %>%
  mutate(
    Sum = sum(Frequency),
    perc = 100 * (Frequency / Sum)
  )

readr::write_csv(mgc_summary, "../outputs/data/majorgroupcode_summary.csv")
Code
mgc_summary %>%
  ggplot(aes(outsourcing_status, perc, fill = as.factor(Majorgroupcode))) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = many_colours)
Code
mgc_key <- data.frame(
  "number" = seq(1, 11, 1),
  "Major group code" = c(levels(haven::as_factor(mgc_summary$Majorgroupcode)), NA)
)

mgc_key %>%
  kable() %>%
  kable_styling(full_width = FALSE)
number  Major.group.code
1       ADMINISTRATIVE AND SECRETARIAL OCCUPATIONS
2       ASSOCIATE PROFESSIONAL OCCUPATIONS
3       CARING, LEISURE AND OTHER SERVICE OCCUPATIONS
4       ELEMENTARY OCCUPATIONS
5       MANAGERS, DIRECTORS AND SENIOR OFFICIALS
6       NA
7       PROCESS, PLANT AND MACHINE OPERATIVES
8       PROFESSIONAL OCCUPATIONS
9       SALES AND CUSTOMER SERVICE OCCUPATIONS
10      SKILLED TRADES OCCUPATIONS
11      NA
The table below shows the percentage of outsourced and non-outsourced workers in each majorgroupcode, as well as the difference between them (positive numbers in the difference column indicate occupations that are more common for outsourced work, negative numbers indicate occupations that are less common for outsourced work).19
The plot below summarises the average pay (x-axis) in each occupation (y-axis) for outsourced and non-outsourced workers (dot colour). The size of each dot represents the percentage of workers within the occupation who are outsourced (blue) or not outsourced (purple).20
It shows that, as might be expected, the outsourced workforce in each occupation is smaller than the non-outsourced workforce, but the ratio is not the same for all occupations. The occupation with the largest outsourced:non-outsourced ratio is elementary occupations: for every 10 non-outsourced workers, there are 4 outsourced workers. This is followed by caring, leisure, and other service occupations, and process, plant and machine operatives, both of which employ 2 outsourced workers for every 10 non-outsourced workers.
Notably, in elementary occupations and sales and customer service occupations, outsourced workers are on average paid more than non-outsourced workers. In contrast, process, plant and machine operatives are paid less if they are outsourced than if they are not outsourced.
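The “outsourced workers per 10 non-outsourced workers” figures can be derived from the within-occupation percentages. A minimal sketch follows; `per_ten` is a hypothetical helper, and the percentages passed to it are illustrative values, not the survey figures.

```r
# Hypothetical helper: outsourced workers per 10 non-outsourced workers,
# rounded to the nearest whole number
per_ten <- function(outsourced, not_outsourced) {
  round(10 * outsourced / not_outsourced)
}

# Illustrative percentages (not survey results)
elementary_like <- per_ten(outsourced = 28, not_outsourced = 72)
caring_like <- per_ten(outsourced = 17, not_outsourced = 83)

print(c(elementary_like = elementary_like, caring_like = caring_like))
```

Because the result is rounded, occupations with somewhat different outsourced shares can end up with the same “per 10” figure.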
Unit occupations
A deep dive into elementary occupations, process occupations, and caring occupations reveals that there are differences between occupations in the size of the outsourced workforce and pay.